Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries. Our main contribution is to show that the pointer machine
model of computation can be extremely useful for proving high, unconditional
lower bounds that cannot be obtained in any other known model of computation
with current techniques. Often our lower bounds match the known space/query-time
trade-off curve, and for all the problems considered there is a close match
between our lower bounds and the known upper bounds, at least for some choice
of input parameters. The problems that we consider are set intersection
queries (both the reporting variant and the semi-group counting variant),
indexing a set of documents for two-pattern queries, forbidden-pattern
queries, or queries with wildcards, and indexing an input set of
gapped-patterns (or two-patterns) to find those matching a document given at
query time.
Comment: Full version of the conference version that appeared at ICALP 2016,
25 pages
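To make the query types concrete, here is a minimal brute-force sketch (the function names are hypothetical, not from the paper). It scans every document per query with no index, which is exactly the baseline the indexes studied above try to beat; it also shows why two-pattern queries are closely tied to set intersection, since the answer is the intersection of the document sets for each pattern.

```python
from typing import List, Set


def docs_containing(docs: List[str], p: str) -> Set[int]:
    """Indices of documents in which p occurs as a substring (full scan)."""
    return {i for i, d in enumerate(docs) if p in d}


def two_pattern_query(docs: List[str], p1: str, p2: str) -> List[int]:
    # Documents containing both patterns: a set intersection of the two answers.
    return sorted(docs_containing(docs, p1) & docs_containing(docs, p2))


def forbidden_pattern_query(docs: List[str], p_plus: str, p_minus: str) -> List[int]:
    # Documents containing p_plus but not p_minus: a set difference.
    return sorted(docs_containing(docs, p_plus) - docs_containing(docs, p_minus))
```

For example, with the documents "abracadabra", "banana" and "cabana", the two-pattern query ("ab", "ra") reports only document 0, and the forbidden-pattern query (+"an", -"ab") reports only document 1. Each query costs time linear in the total text length; an index trades extra space for faster queries, and that space/time trade-off is what the lower bounds constrain.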
Applications of incidence bounds in point covering problems
In the Line Cover problem a set of n points is given and the task is to cover
the points using either the minimum number of lines or at most k lines. In
Curve Cover, a generalization of Line Cover, the task is to cover the points
using curves with d degrees of freedom. Another generalization is the
Hyperplane Cover problem where points in d-dimensional space are to be covered
by hyperplanes. All these problems have kernels of polynomial size, where the
parameter is the minimum number of lines, curves, or hyperplanes needed. First
we give a non-parameterized algorithm for both problems in O*(2^n) (where the
O*(.) notation hides polynomial factors of n) time and polynomial space,
beating a previous exponential-space result. Combining this with incidence
bounds similar to the famous Szemerédi-Trotter bound, we present a Curve Cover
algorithm with running time O*((Ck/log k)^((d-1)k)), where C is some constant.
Our result improves the previous best times O*((k/1.35)^k) for Line Cover
(where d=2), O*(k^(dk)) for general Curve Cover, as well as a few other bounds
for covering points by parabolas or conics. We also present an algorithm for
Hyperplane Cover in R^3 with running time O*((Ck^2/log^(1/5) k)^k), improving
on the previous time of O*((k^2/1.3)^k).
Comment: SoCG 201
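The algorithms above are not reproduced here. As a swapped-in illustration of the Line Cover problem itself, the sketch below uses the simple textbook bounded-search-tree idea (not the authors' method): some line of any cover passes through a chosen point p, and either that line is fixed by p and another point q, or p can be given a line of its own. The function names are hypothetical and the running time is roughly n^k times a polynomial, far slower than the bounds stated above.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def collinear(a: Point, b: Point, c: Point) -> bool:
    # The cross product of (b - a) and (c - a) is zero iff a, b, c are collinear.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) == 0


def _cover(points: List[Point], k: int) -> bool:
    if not points:
        return True          # nothing left to cover
    if k == 0:
        return False         # points remain but no lines are left
    p, rest = points[0], points[1:]
    # Option 1: give p a line of its own that covers no other remaining point.
    if _cover(rest, k - 1):
        return True
    # Option 2: p's line also passes through some other point q; that line is
    # fixed by p and q, so drop every point on it and recurse.
    for q in rest:
        remaining = [r for r in rest if not collinear(p, q, r)]
        if _cover(remaining, k - 1):
            return True
    return False


def line_cover(points: List[Point], k: int) -> bool:
    """Decide whether at most k lines can cover all the given points."""
    # Duplicates would make collinear(p, p, r) trivially true, so remove them.
    return _cover(list(dict.fromkeys(points)), k)
```

The minimization version follows by trying k = 0, 1, 2, ... until line_cover succeeds. The integer cross-product test avoids floating-point slopes, which keeps the collinearity check exact for integer coordinates.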
Audio Quality Assurance: An Application of Cross Correlation: Paper - iPRES 2012 - Digital Curation Institute, iSchool, Toronto
We describe algorithms for automated quality assurance on the content of audio files in the context of preservation actions and access. The algorithms use cross correlation to compare the sound waves. They are used for overlap analysis in an access scenario, where preserved radio broadcasts are used in research and annotated. They have been applied in a migration scenario, where radio broadcasts are to be migrated for long-term preservation. This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137)
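The following is a minimal sketch of how cross correlation can locate where one audio excerpt overlaps another, assuming both are already decoded into lists of samples at the same sample rate. The function name and the choice of normalised correlation are illustrative assumptions; the paper's exact procedure is not given here. The direct loop costs O(n·m); for long recordings an FFT-based routine such as `scipy.signal.correlate` would normally be preferred.

```python
import math
from typing import Sequence, Tuple


def best_overlap(a: Sequence[float], b: Sequence[float]) -> Tuple[int, float]:
    """Return (offset, score) where b best aligns inside a.

    The score is the normalised cross-correlation at that offset: 1.0 means the
    window of a is an exact positive multiple of b; values near 0 indicate
    little similarity.
    """
    if not b or len(b) > len(a):
        raise ValueError("b must be non-empty and no longer than a")
    b_energy = sum(y * y for y in b)
    best_lag, best_score = 0, -math.inf
    for lag in range(len(a) - len(b) + 1):
        window = a[lag:lag + len(b)]
        dot = sum(x * y for x, y in zip(window, b))
        norm = math.sqrt(sum(x * x for x in window) * b_energy)
        score = dot / norm if norm else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

Normalising by the signal energies makes the score insensitive to volume differences, so a migrated file that was merely re-levelled still scores close to 1.0 at the correct offset, while a damaged or truncated one scores noticeably lower.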
Top-k Term-Proximity in Succinct Space
Let 𝒟 = {T_1, T_2, ..., T_D} be a collection of D string documents of n characters in total, drawn from an alphabet Σ = [σ]. The top-k document retrieval problem is to preprocess 𝒟 into a data structure that, given a query (P[1..p], k), can return the k documents of 𝒟 most relevant to the pattern P. Relevance is captured by a predefined ranking function that depends on the set of occurrences of P in T_d. For example, it can be the term frequency (i.e., the number of occurrences of P in T_d), the term proximity (i.e., the distance between the closest pair of occurrences of P in T_d), or a pattern-independent importance score of T_d such as PageRank. Linear-space and optimal-query-time solutions already exist for this problem. Compressed and compact space solutions are also known, but only for a few ranking functions, such as term frequency and importance. However, space-efficient data structures for term-proximity-based retrieval have been elusive. In this paper we present the first sub-linear space data structure for this relevance function, which uses only o(n) bits on top of any compressed suffix array of 𝒟 and solves queries in time O((p+k) polylog n).
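The ranking functions defined above can be illustrated with a naive baseline that scans every document per query (hypothetical names, not the paper's data structure). It assumes, as a labelled simplification, that a document with fewer than two occurrences of P has infinite term proximity and is not reported. The point of the paper is to answer the same query in O((p+k) polylog n) time using only o(n) extra bits, instead of this linear scan.

```python
import heapq
import math
from typing import List


def occurrences(text: str, p: str) -> List[int]:
    """Sorted start positions of all (possibly overlapping) occurrences of p."""
    out, i = [], text.find(p)
    while i != -1:
        out.append(i)
        i = text.find(p, i + 1)
    return out


def term_frequency(text: str, p: str) -> int:
    return len(occurrences(text, p))


def term_proximity(text: str, p: str) -> float:
    """Distance between the closest pair of occurrences of p (inf if fewer than 2)."""
    occ = occurrences(text, p)
    # Occurrences are sorted, so the closest pair is some adjacent pair.
    return min((b - a for a, b in zip(occ, occ[1:])), default=math.inf)


def top_k_proximity(docs: List[str], p: str, k: int) -> List[int]:
    """Indices of the k documents with the smallest term proximity for p."""
    scored = [(term_proximity(d, p), i) for i, d in enumerate(docs)]
    # Assumption: documents with infinite proximity are not reported at all.
    finite = [s for s in scored if s[0] != math.inf]
    # Ties are broken by document index because tuples compare element-wise.
    return [i for _, i in heapq.nsmallest(k, finite)]
```

For the documents "abab", "aXXXXa", "aaa" and "b" with P = "a", the proximities are 2, 5, 1 and infinity, so a top-2 query returns documents 2 and 0. Unlike term frequency, proximity is not additive over occurrences, which hints at why compact structures for it are harder to build.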